1. (Original) A method for recovering from failures affecting a resource manager within a group of resource managers, wherein the resource managers within the group have access to a shared resource via which remote resource managers communicate with the resource managers within the group, the shared resource including data storage structures to which resource managers within said group connect to send and receive communications, the method comprising:

storing, within a first data storage structure of the shared resource, unit of work descriptors for operations performed in relation to said shared resource by the resource managers in said group;

sending a notification of a connection failure between a second data storage structure of the shared resource and a first resource manager within said group, the notification being sent to the remaining resource managers within the group which are connected to the second data storage structure;

one or more of said remaining resource managers accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first resource manager when the connection failure occurred; and

said one or more remaining resource managers recovering the identified units of work.
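By way of illustration only, the steps of claim 1 can be sketched as follows. This is a minimal in-memory model, not the claimed implementation: the class names, the manager names (QM1, QM2, QM3), the structure names (APPL1, APPL2) and the `recovered_by` field are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnitOfWork:
    uow_id: str
    owner: str       # resource manager performing the unit of work
    structure: str   # data storage structure the work relates to
    recovered_by: Optional[str] = None


@dataclass
class SharedResource:
    # Stands in for the "first data storage structure" holding descriptors.
    descriptors: list = field(default_factory=list)
    # Hypothetical bookkeeping: structure name -> connected resource managers.
    connections: dict = field(default_factory=dict)

    def notify_connection_failure(self, failed_rm, structure):
        """Drop the failed manager and return the peers to be notified."""
        self.connections[structure].discard(failed_rm)
        return sorted(self.connections[structure])


def recover(shared, peer, failed_rm, structure):
    """A remaining manager scans the descriptors and recovers the matches."""
    recovered = []
    for uow in shared.descriptors:
        if (uow.owner == failed_rm and uow.structure == structure
                and uow.recovered_by is None):
            uow.recovered_by = peer
            recovered.append(uow.uow_id)
    return recovered


shared = SharedResource(
    descriptors=[
        UnitOfWork("u1", "QM1", "APPL1"),
        UnitOfWork("u2", "QM2", "APPL1"),
        UnitOfWork("u3", "QM1", "APPL2"),
    ],
    connections={"APPL1": {"QM1", "QM2", "QM3"}, "APPL2": {"QM1"}},
)
peers = shared.notify_connection_failure("QM1", "APPL1")
result = recover(shared, peers[0], "QM1", "APPL1")
```

In this sketch only the unit of work that QM1 was performing against the failed structure is recovered; its work against the other structure is left untouched.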

2. (Original) A method according to claim 1 wherein, if there are no remaining resource managers connected to the second data storage structure after said connection failure, said notification is sent to a remaining resource manager when that resource manager connects to the second data storage structure.

3. (Original) A method according to claim 1 wherein, if there are no remaining resource managers connected to the second data storage structure after said connection failure, the failed resource manager determines when it is restarted whether any other resource manager has performed recovery for its units of work relating to the second data storage structure and, upon determining that no resource manager has performed said recovery, the restarted resource manager recovers said units of work.

4. (Original) A method according to claim 1, wherein all remaining resource managers within the group which are connected to the second data storage structure respond to said notification by attempting to access said first data storage structure to identify units of work to recover, and the method includes the further steps of:

responsive to a first remaining resource manager identifying a unit of work to recover, said first remaining resource manager attempting to set a flag for said unit of work;

responsive to successfully setting said flag, assigning recovery responsibility for said unit of work to said first remaining resource manager; and

refusing to assign recovery responsibility for said unit of work to said first remaining resource manager if said flag has been set by another remaining resource manager.

5. (Original) A method according to claim 4, including the further step of:

responsive to said flag having been set by another remaining resource manager, said first remaining resource manager attempting to identify a further unit of work to recover and attempting to set a flag for said identified further unit of work.
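The flag handling of claims 4 and 5 can be illustrated with the following sketch, which models the flag as an atomic test-and-set. The claims do not say how the flag is set atomically; the `threading.Lock` used here is an assumption made for the example, as are the class, function and manager names.

```python
import threading


class RecoveryFlags:
    """Hypothetical flag table: one flag per unit of work, set at most once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = {}

    def try_set(self, uow_id, manager):
        """Set the flag if it is still clear; report whether this call won."""
        with self._lock:
            if uow_id in self._owner:
                return False  # flag already set by another manager
            self._owner[uow_id] = manager
            return True

    def owner(self, uow_id):
        return self._owner.get(uow_id)


def claim_units(flags, manager, candidates):
    """Try each candidate in turn, keeping those whose flag this manager set."""
    claimed = []
    for uow_id in candidates:
        if flags.try_set(uow_id, manager):
            claimed.append(uow_id)
        # Otherwise responsibility is refused and the manager moves on
        # to the next unit of work, as in claim 5.
    return claimed


flags = RecoveryFlags()
first = claim_units(flags, "QM2", ["u1", "u2"])
second = claim_units(flags, "QM3", ["u1", "u2", "u3"])
```

Each unit of work ends up with exactly one recovering manager, regardless of how many managers respond to the notification.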

6. (Original) A method according to claim 4, including the following steps in response to a connection failure between the second data storage structure of the shared resource and said first remaining resource manager during recovery of said unit of work:

sending a notification of said connection failure to the remaining resource managers within the group which are connected to the second data storage structure;

one or more of said remaining resource managers accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first remaining resource manager when the connection failure occurred; and

said one or more remaining resource managers recovering the identified units of work.

7. (Original) A method according to claim 1, wherein the unit of work descriptors include:

a unit of work identifier;

an identification of messages put or retrieved within the unit of work;

a status for the unit of work; and

a sequence number.

8. (Original) A method according to claim 1, wherein the shared resource is a coupling facility list structure, the second data storage structure is a coupling facility list structure in which a coupling facility list header represents a shared access message queue, and the first data storage structure is an administration list structure of the coupling facility for storing unit of work descriptors.

9. (Original) A method according to claim 8, including storing within the coupling facility, for each resource manager within the group, a list header information map representing the set of shared access message queues within the second data storage structure for which the resource manager has performed some work.

10. (Original) A method according to claim 9, including reading said list header information map during recovery to identify the set of shared access message queues within the second data storage structure for which the failed resource manager has performed some work.

11. (Original) A method according to claim 1, including storing within the shared resource a structure interest map identifying the set of data storage structures to which respective resource managers within said group are connected.

12. (Original) A method according to claim 11, wherein the step of recovering the identified units of work is a first recovery phase and wherein the method includes a second recovery phase comprising the steps of:

reading the structure interest map for the failed resource manager to identify the set of data storage structures to which the failed resource manager was connected at the time of said connection failure;

identifying any operations performed by the failed resource manager on said set of data storage structures which were not recovered in the first recovery phase; and

one or more of said remaining resource managers then backing out said unrecovered operations.

13. (Original) A method according to claim 12, wherein the method includes setting a key for operations performed in relation to the shared resource, the key identifying the resource manager which performed the operation, and wherein the identification of operations performed by the failed resource manager comprises checking said keys for unrecovered operations performed in relation to any of said set of data storage structures.
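The second recovery phase of claims 12 and 13 can be illustrated with the following sketch. It assumes, purely for the example, that the structure interest map is a dictionary from resource manager name to a set of structure names and that each operation carries its key as a plain string; none of these representations is taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    op_id: str
    structure: str
    key: str                 # identifies the resource manager that performed it
    recovered: bool = False  # True if handled in the first recovery phase
    backed_out: bool = False


def second_phase(interest_map, operations, failed_rm):
    """Back out the failed manager's operations that phase one did not recover."""
    # Structures the failed manager was connected to when it failed.
    structures = interest_map.get(failed_rm, set())
    backed_out = []
    for op in operations:
        if (op.structure in structures and op.key == failed_rm
                and not op.recovered):
            op.backed_out = True
            backed_out.append(op.op_id)
    return backed_out


interest_map = {"QM1": {"APPL1", "APPL2"}, "QM2": {"APPL1"}}
operations = [
    Operation("op1", "APPL1", "QM1", recovered=True),
    Operation("op2", "APPL2", "QM1"),
    Operation("op3", "APPL1", "QM2"),
]
backed_out = second_phase(interest_map, operations, "QM1")
```

Only the operation keyed to the failed manager and not already recovered is backed out; operations keyed to other managers are ignored.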

14. (Original) A method according to claim 1, wherein a single unit of work represented by a unit of work descriptor may include operations performed in relation to a plurality of data storage structures, and wherein the partial units of work corresponding to said operations are recovered by different ones of said remaining resource managers within the group.

15. (Original) A method for recovering from failures affecting a resource manager within a group of resource managers, wherein the resource managers within the group have access to a shared resource, the shared resource including data storage structures to which resource managers within said group connect to perform operations in relation to data held in said shared resource, the method comprising:

storing, within a first data storage structure of the shared resource, unit of work descriptors for operations performed by the resource managers in said group in relation to data held in said shared resource;

sending a notification of a connection failure between a second data storage structure of the shared resource and a first resource manager within said group, the notification being sent to the remaining resource managers within the group which are connected to the second data storage structure;

one or more of said remaining resource managers accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first resource manager when the connection failure occurred; and

said one or more remaining resource managers recovering the identified units of work.

16. (Original) A method according to claim 15, wherein the data storage structures of said shared resource include data storage structures which contain shared message queues and said operations performed in relation to said shared resource include putting messages onto a shared message queue and retrieving messages from a shared message queue, for communication between a remote resource manager and resource managers within said group.

17. (Original) A method according to claim 16, wherein the unit of work descriptors include:

a unit of work identifier;

an identification of messages put or retrieved within the unit of work;

a status for the unit of work; and

a sequence number.

18. (Original) A method according to claim 16, wherein the operations of putting messages onto a shared queue and retrieving messages from a shared queue are performed under transactional scope such that a message which is put is only available to resource managers other than the resource manager putting the message after commitment of the put operation and a message which is retrieved is only available to the retrieving resource manager after commitment of the retrieval operation, and wherein said stored unit of work descriptors identify each of the following:

units of work that were uncommitted but for which a decision to commit had been made when the failure occurred;

units of work that were uncommitted but for which a decision to abort had been made when the failure occurred; and

units of work for which no commit or abort decision had been made when the failure occurred;

and wherein recovering the identified units of work comprises:

committing message put and retrieval operations for which a decision to commit had been made;

backing out message put and retrieval operations for which a decision to back out had been made; and

backing out message put and message retrieval operations for which no commit or abort decision had been made.
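The recovery rule of claim 18 reduces to a mapping from the recorded status of a unit of work to a recovery action, which can be illustrated with the following sketch. The enum and function names are assumptions introduced for the example.

```python
from enum import Enum


class Status(Enum):
    COMMIT_DECIDED = "uncommitted, decision to commit made"
    ABORT_DECIDED = "uncommitted, decision to abort made"
    UNDECIDED = "no commit or abort decision made"


def resolve(status):
    """Return the recovery action for a unit of work in the given status."""
    if status is Status.COMMIT_DECIDED:
        return "commit"
    # Both an explicit abort decision and the absence of any decision
    # lead to the put and retrieval operations being backed out.
    return "back out"


outcomes = {status.name: resolve(status) for status in Status}
```

Only a recorded decision to commit results in the operations being committed; every other status is resolved by backing out.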

19. (Original) A distributed data processing system including:

a plurality of resource managers;

a shared access resource including data storage structures to which the resource managers connect to send and receive communications to and from remote resource managers, the shared access resource including:

means for storing, within a first data storage structure of the shared resource, unit of work descriptors for operations performed in relation to said shared resource by the resource managers in said plurality; and

means for sending a notification of a connection failure between a second data storage structure of the shared resource and a first resource manager within said plurality, the notification being sent to the remaining resource managers within the plurality which are connected to the second data storage structure;

wherein said remaining resource managers include:

means for accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first resource manager when the connection failure occurred; and

means for recovering the identified units of work.

20. (Original) A computer program product comprising program code recorded on a machine-readable recording medium, the program code comprising the following set of components:

a plurality of resource managers;

a shared access resource manager including program code for managing storage and retrieval of data within data storage structures to which the resource managers connect to send and receive communications to and from remote resource managers, the shared access resource manager including:

means for storing, within a first data storage structure of the shared resource, unit of work descriptors for operations performed in relation to said shared resource by the resource managers in said plurality; and

means for sending a notification of a connection failure between a second data storage structure of the shared resource and a first resource manager within said plurality, the notification being sent to the remaining resource managers within the plurality which are connected to the second data storage structure;

wherein said remaining resource managers include:

means for accessing said first data storage structure and analysing the unit of work descriptors to identify the units of work relating to the second data storage structure that were being performed by the first resource manager when the connection failure occurred; and

means for recovering the identified units of work.